Anamorphic codes

ABSTRACT

The error tolerance of an array of m storage units is increased by using a technique referred to as “dodging.” A plurality of k stripes are stored across the array of storage units in which each stripe has n+r elements that correspond to a symmetric code having a minimum Hamming distance d=r+1. Each respective element of a stripe is stored on a different storage unit. An element is selected in a donor stripe when a difference between a minimum distance of the donor stripe and a minimum distance of a recipient stripe is greater than or equal to 2. The selected element is stored on a storage unit having no elements of the recipient stripe. A lost element of the recipient stripe is then rebuilt on the selected element.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is related to patent application Ser. No. ______ (Attorney Docket No. ARC9-2003-0015-US1), entitled “Automatic Parity Exchange”, patent application Ser. No. ______ (Attorney Docket No. ARC9-2003-0016-US1), entitled “Multi-path Data Retrieval From Redundant Array,” and patent application Ser. No. ______ (Attorney Docket No. ARC9-2003-0040-US1), entitled “RAID 3+3” each co-pending, co-assigned and filed concurrently herewith, and each incorporated by reference herein. The present application is also related to co-pending and co-assigned patent application Ser. No. ______ (Attorney Docket No. YOR9-2003-0069-US1), which is also incorporated by reference herein.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to storage systems. In particular, the present invention relates to a method for configuring an array of storage units for increasing the number of storage-unit failures that the array can tolerate without loss of data stored on the array.

2. Description of the Related Art

The following definitions are used herein and are offered for purposes of illustration and not limitation:

An “element” is a block of data on a storage unit.

A “base array” is a set of elements that comprise an array unit for an ECC.

An “array” is a set of storage units that holds one or more base arrays.

A “stripe” is a base array within an array.

n is the number of data units in the base array.

r is the number of redundant units in the base array.

m is the number of storage units in the array.

d is the minimum Hamming distance of the base array.

D is the minimum Hamming distance of the array.

In a conventional array, the number of storage units in the array equals the number of data units in a base array plus the number of redundant units in the base array. That is, m=n+r. Most conventional storage arrays use a Maximum Distance Separable (MDS) code, such as parity, or a mirroring technique for tolerating failures. The minimum Hamming distance of the base array using an MDS code equals one plus the number of redundant units in the base array (i.e., d=1+r). For a mirror configuration, the number of redundant units in the base array equals the number of data units in the base array (r=n=1), and the minimum Hamming distance is d=2.

It is possible to anamorphically encode an array over m storage units, where m is greater than the number of data units n plus the number of redundant units r in a base array, that is, m>n+r. In the literature, when an anamorphic encoding is used for arranging parity blocks for performance, such an encoding is typically referred to as “de-clustering parity.” As used herein, such an encoding scheme is referred to as an anamorphic encoding scheme because it more accurately identifies that the encoding scheme can provide new properties for an array.

Anamorphism is achieved by selectively arranging a set of base arrays within an array. For example, consider the exemplary array 200 shown in FIG. 2 that uses a four-element code. Array 200 includes six storage units D1-D6 depicted in a columnar form. For array 200, m=6. Array 200 also includes several base arrays that are each formed from n data units plus r redundant units. That is, for each base array, n+r=4. The respective base arrays are numbered sequentially as stripes 1-3 in FIG. 2 to indicate that the four-element code of array 200 is spread across storage units D1-D6. There are four blocks in each stripe and each stripe acts as an independent base array. The minimum distance of the array is, accordingly, the minimum of all the minimum Hamming distances of the respective stripes, that is, D=min(d_(i)), where d_(i) is the minimum distance of stripe i.

As configured, anamorphic array 200 can tolerate the loss of at least r storage units of a set of m storage units without loss of data, instead of exactly r units from a set of n storage units. Thus, if r=2 and the code used is MDS, then any two storage units can fail without loss of data. A stripe will fail if any three of its elements are lost. There are, however, some combinations of three-unit failures that can be tolerated by anamorphic array 200. For example, if storage units D1, D3 and D5 each fail, two elements of stripe 1, two elements of stripe 2 and two elements of stripe 3 are lost, but no stripe has lost three elements. Anamorphic array 200 is, thus, over-specified and may be advantageously exploited.
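
The tolerated and fatal three-unit failures of such an array can be enumerated directly. The following Python sketch is illustrative only. FIG. 2 is not reproduced here, so the stripe placement `LAYOUT_200` is a hypothetical assignment chosen merely to agree with the D1, D3, D5 example above; the function names are likewise assumptions. The sketch assumes r=2 and an MDS code, so that an intact stripe has d=3.

```python
from itertools import combinations

R = 2  # redundant elements per stripe; an intact MDS stripe has d = R + 1

# Hypothetical placement of stripes 1-3 on units D1-D6 (an assumption, not FIG. 2 itself).
LAYOUT_200 = {
    1: {1, 2, 3, 4},
    2: {3, 4, 5, 6},
    3: {1, 2, 5, 6},
}

def stripe_distances(layout, failed, r=R):
    """Remaining distance of each stripe: d_i = (r + 1) - (elements lost)."""
    return {s: (r + 1) - len(units & failed) for s, units in layout.items()}

def array_distance(layout, failed, r=R):
    """D = min(d_i); a value below 1 means at least one stripe has lost data."""
    return min(stripe_distances(layout, failed, r).values())

triples = [set(c) for c in combinations(range(1, 7), 3)]
tolerated = [t for t in triples if array_distance(LAYOUT_200, t) >= 1]

print(stripe_distances(LAYOUT_200, {1, 3, 5}))   # {1: 1, 2: 1, 3: 1}
print(len(tolerated), "of", len(triples), "three-unit failures are tolerated")
```

Under this assumed placement, 8 of the 20 possible three-unit failures leave every stripe with at least one unit of distance, while every two-unit failure is tolerated.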

What is needed is a technique that enhances the minimum Hamming distance of an ECC when it is used with an anamorphic array of storage units, and thereby increases the effective minimum distance of the array.

BRIEF SUMMARY OF THE INVENTION

The present invention provides a technique that enhances the minimum Hamming distance of an ECC that is used with an anamorphic array of storage units, thereby increasing the minimum distance of the array.

The advantages of the present invention are provided by a first embodiment that is a method and a system for increasing an error tolerance of an anamorphic array of m storage units in which k stripes are stored across the array of m storage units. Each stripe has n+r elements that correspond to a symmetric code having a minimum Hamming distance d=r+1 and in which n is a number of data storage units in a base array of the array of m storage units and r is a number of redundant units in the base array. Additionally, n=r, n≧2, m>n+r, jm=k(n+r), and j and k are integers. Each respective element of a stripe is stored on a different storage unit. An element is selected when a difference between a minimum distance of a donor stripe and a minimum distance of a recipient stripe is greater than or equal to 2. The selected element is rebuilt on a storage unit having no elements of the recipient stripe. Before the lost element is rebuilt, the storage units storing the donor stripe are made aware that the selected element has been donated so that data is not read from or written to the selected element as part of the donor stripe. A lost element of the recipient stripe is then rebuilt on the selected element. According to the present invention, the minimum Hamming distance of the recipient stripe is d≧1 before the element in the donor stripe is selected. The selected element of the donor stripe can be further selected based on a minimal performance impact on the array. Additionally, the recipient information is selected based on an improved performance of the array.
The array of the storage system includes redundancy based on an erasure or error correcting code, such as a parity code, a Winograd code, a symmetric code, a Reed-Solomon code, an EVENODD code, or a derivative of an EVENODD code. Alternatively, the array includes redundancy based on a product of a plurality of erasure or error correcting codes in which at least one of the erasure or error correcting codes is a parity code, a Winograd code, a symmetric code, a Reed-Solomon code, an EVENODD code, or a derivative of an EVENODD code.

When an element in the donor stripe fails during the step of rebuilding at least a portion of recipient information from the recipient stripe on the selected element, the step of rebuilding at least a portion of recipient information from the recipient stripe on the selected element is terminated. A second donor stripe is selected from the plurality of stripes when a difference between a minimum distance of the second donor stripe and a minimum distance of the second recipient stripe is greater than or equal to 2. A donor element is selected in the second donor stripe. At least a portion of lost recipient information from the recipient stripe is rebuilt on the selected element in the second donor stripe. When a spare element becomes available, the spare element is assigned to a selected storage unit.

A second embodiment of the present invention provides a method and a system for increasing the failure tolerance of an array of m storage units that is vulnerable to selected patterns of failures. According to this embodiment of the present invention, k stripes are stored across the array of m storage units. The array of m storage units is an anamorphic array. Each stripe has n+r elements in which n is the number of data elements in the base array, r is the number of redundant elements in the base array, m>n+r, jm=k(n+r), and j and k are integers. Each stripe has a plurality of elements, and each stripe forms an erasure or error correcting code having a minimum Hamming distance d. Each respective element of a stripe is stored on a different storage unit. Subsequent to an element failure, a recipient element is selected. An element in a donor stripe is selected such that a failure tolerance of the array is increased following a rebuild operation. A lost element of the recipient stripe is rebuilt on the selected element of the donor stripe. The minimum Hamming distance of the recipient stripe is d≧2 before the element is selected in the donor stripe. Moreover, the minimum Hamming distance of the array is increased upon completion of rebuilding the recipient stripe on the selected element of the donor stripe. The recipient element can be selected based on a failure pattern of the array. Additionally, the donor element can be selected based on a predetermined target pattern. The storage units storing the donor stripe are made aware that the selected element has been donated before the lost element of the recipient stripe is rebuilt on the selected element. The array of storage units includes redundancy based on an erasure or error correcting code, such as a parity code, a Winograd code, a symmetric code, a Reed-Solomon code, an EVENODD code or a derivative of an EVENODD code.
Alternatively, the array of storage units includes redundancy based on a product of a plurality of erasure or error correcting codes, such that at least one of the erasure or error correcting codes is a parity code, a Winograd code, a symmetric code, a Reed-Solomon code, an EVENODD code or a derivative of an EVENODD code.

A third embodiment of the present invention provides a method and a system for increasing an error tolerance of a storage system having a plurality of arrays of storage units, such that each array includes m storage units and k stripes are stored across each respective array of m storage units. Each stripe has n+r elements corresponding to a symmetric code having a minimum Hamming distance d and in which n is a number of storage units in a base array of the array of m storage units and r is a number of redundant units in the base array. Additionally, n=r, n≧2, m>n+r, jm=k(n+r), and j and k are integers. Each respective element of a stripe is stored on a different storage unit in the array. An element is selected in a donor stripe when a difference between a minimum distance of the donor stripe and a minimum distance of a recipient stripe is greater than or equal to 2. The selected element is stored on a storage unit having no elements of the recipient stripe. The donor stripe can be stored on an array that is different from the array of the recipient stripe. Alternatively, the donor stripe can be stored on the same array as the recipient stripe. Before the lost element is rebuilt, the storage units storing the donor stripe are made aware that the selected element has been donated so that data is not read from or written to the selected element as part of the donor stripe. A lost element of the recipient stripe is then rebuilt on the selected element. According to the present invention, the preferred minimum Hamming distance of the recipient stripe is d≧2 before the element in the donor stripe is selected. The selected element of the donor stripe can be further selected based on a minimal performance impact on the donor stripe or based on a minimal performance impact on the storage system. Additionally, the recipient information is selected based on an improved performance of the recipient stripe or based on an improved performance of the storage system.
The array of the storage system includes redundancy based on an erasure or error correcting code, such as a parity code, a Winograd code, a symmetric code, a Reed-Solomon code, an EVENODD code, or a derivative of an EVENODD code. Alternatively, the array includes redundancy based on a product of a plurality of erasure or error correcting codes in which at least one of the erasure or error correcting codes is a parity code, a Winograd code, a symmetric code, a Reed-Solomon code, an EVENODD code, or a derivative of an EVENODD code.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not by limitation in the accompanying figures in which like reference numerals indicate similar elements and in which:

FIG. 1 a shows a typical configuration of a storage system with a plurality of arrays connected to a common storage controller;

FIG. 1 b shows a typical configuration of a storage system with a plurality of arrays connected to separate storage controllers;

FIG. 2 depicts an exemplary anamorphic array having six storage units;

FIG. 3 depicts an exemplary anamorphic array having nine storage units for illustrating the benefits of a dodging operation according to the present invention;

FIG. 4 depicts a first three-storage-unit-failure arrangement of the anamorphic array depicted in FIG. 3;

FIG. 5 depicts a second three-storage-unit-failure arrangement of the anamorphic array depicted in FIG. 3;

FIG. 6 depicts the array of FIG. 5 after performing a dodging operation according to the present invention;

FIG. 7 depicts a third three-storage-unit-failure arrangement of the anamorphic array depicted in FIG. 3;

FIG. 8 depicts the array of FIG. 7 after performing a dodging operation according to the present invention;

FIG. 9 depicts an exemplary array having eight storage units using a (3+3) symmetric code;

FIG. 10 depicts a system having three eight-storage-unit arrays;

FIG. 11 depicts the array system of FIG. 10 after performing an external dodging operation according to the present invention; and

FIG. 12 depicts a ten storage unit array with three failed storage units following a set of dodging operations.

DETAILED DESCRIPTION OF THE INVENTION

The number of failures of an array that can be tolerated by an Erasure or Error Correcting Code (ECC), such as a parity code, a Winograd code, a symmetric code, a Reed-Solomon code, an EVENODD code, or a derivative of an EVENODD code, is at least the minimum Hamming distance d of the ECC minus one, that is, d−1. The present invention enhances the minimum Hamming distance of an ECC by utilizing an operation, referred to herein as a “dodging” operation, for providing an “effective distance” that is greater than the Hamming distance of the ECC. Thus, the number of failures that an array can tolerate, whether a failure is a device failure or a hard error, is increased beyond the minimum distance provided by the ECC. As used herein, the terms “effective distance” and “effective minimum distance” refer to one plus the number of failures that can be tolerated by an array utilizing one or more dodging operations according to the present invention.

According to the present invention, a dodging operation is a process in which a stripe within an array is selected for donating an element to a recipient stripe, and recipient information is rebuilt onto the donated element, thereby increasing the minimum distance of the array. A dodging operation can be performed on a pair of stripes (i,j) when the distance d_(i)≧d_(j)+2. After the dodging operation, the donor stripe will drop in distance by 1. In contrast, the recipient stripe will increase in distance by 1. When a dodging operation can be performed on all stripes that are at the minimum array distance, then the overall minimum array distance will be increased. A dodging operation can occur at varying distances depending on the configuration of the array.
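
The distance bookkeeping of a single dodging operation can be sketched as follows. This Python fragment is a minimal illustration only; the function names `can_dodge` and `dodge` and the example distances are assumptions introduced here. It also shows why a difference of at least 2 is required: with a difference of only 1, the donor and the recipient would simply trade distances and the minimum would not improve.

```python
def can_dodge(d_donor, d_recipient):
    """A pair of stripes qualifies when the donor leads the recipient by at least 2."""
    return d_donor >= d_recipient + 2

def dodge(distances, donor, recipient):
    """Return updated stripe distances after the donor gives one element to the recipient."""
    if not can_dodge(distances[donor], distances[recipient]):
        raise ValueError("donor distance must exceed recipient distance by at least 2")
    updated = dict(distances)
    updated[donor] -= 1       # the donor stripe drops in distance by 1
    updated[recipient] += 1   # the recipient stripe increases in distance by 1
    return updated

# Illustrative distances for three stripes (assumed values).
before = {1: 1, 2: 2, 3: 3}
after = dodge(before, donor=3, recipient=1)
print(after)                                          # {1: 2, 2: 2, 3: 2}
print(min(before.values()), "->", min(after.values()))  # 1 -> 2
```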

FIG. 1 a shows an exemplary storage system, indicated generally as 100, comprising two storage arrays 102 and 103 that are connected to a common array controller 101. Storage arrays 102 and 103 comprise multiple storage units 104 and communicate with array controller 101 over interface 105. Array controller 101 communicates to other controllers and host systems over interface 106. Such a configuration allows an array controller to communicate with multiple storage arrays.

FIG. 1 b shows an exemplary storage system, indicated generally as 150, comprising two storage arrays 153 and 154, that are respectively connected to different array controllers 152 and 151. Storage array 153 communicates with array controller 152 over interface 157, and storage array 154 communicates with array controller 151 over interface 156. Array controllers 151 and 152 respectively communicate with other array controllers and storage systems over interfaces 158 and 159. Also shown in FIG. 1 b is a communication connection 160 that allows array controllers 151 and 152 to communicate with each other.

The array controllers shown in FIGS. 1 a and 1 b may be designed as hardware or software controllers. The term controller will be used herein generally to refer to any of the configurations described above.

Many anamorphic arrays can benefit from dodging. The ability of an anamorphic array to benefit from a dodging operation can be verified by inspection of failure combinations for the array. For example, FIG. 3 shows an exemplary anamorphic array 300 having nine storage units D1-D9. For this example, array 300 uses a (3+3) symmetric code that is MDS with n=3, r=3, and d=4. Array 300 is arranged to have three redundant elements that can correct any three failed elements from the six elements in a stripe. Three unique stripes, respectively indicated as 1, 2 and 3, are arranged within array 300. Any three storage-unit failures will not affect more than three elements of any one stripe. Thus, array 300 has a minimum distance D=4.

To illustrate that a dodging operation according to the present invention can be performed to increase the effective minimum distance of array 300, there are only three arrangements of three-storage-unit failures that need consideration. The first three-storage-unit-failure arrangement is shown in FIG. 4, in which an X within a block indicates a storage-unit failure. In this particular failure arrangement, two blocks of each stripe 1-3 have failed. Consequently, each stripe still has a minimum distance d=2 and, therefore, array 300 has a minimum distance D=2. Thus, array 300, as shown in FIG. 4, can tolerate one further storage-unit failure without possible data loss.

A second three-storage-unit-failure arrangement is shown in FIG. 5 in which an X within a block indicates a storage-unit failure. In the second three-storage-unit-failure arrangement, stripe 1 has lost three elements, stripe 2 has lost two elements and stripe 3 has lost one element. Thus, stripe 1 has minimum distance d=1 and can tolerate no further storage-unit failures without data loss. Stripe 2 has minimum distance d=2, and stripe 3 has minimum distance d=3. Consequently, array 300 has minimum distance D=1.

A dodging operation can be performed for the second three-storage-unit-failure arrangement by rebuilding the contents of one of the lost elements in stripe 1 in a well-known manner onto one of the non-failed elements of stripe 3. FIG. 6 depicts the array of FIG. 5 after performing a dodging operation. Rebuilt data in FIG. 6 is underlined. Here, the element of stripe 3 on unit D9 has been donated to stripe 1. Following the dodging operation, all stripes have minimum distance d=2 and, therefore, array 300 has minimum distance D=2. The configuration of the array after the dodging operation can now tolerate one further failure without data being lost.

A third three-storage-unit-failure arrangement is shown in FIG. 7 in which an X in a block indicates a storage-unit failure. In the third three-storage-unit-failure arrangement, stripes 1 and 2 have each lost three elements, and have minimum distance d=1. Stripe 3, however, has not lost any elements and, consequently, has minimum distance d=4. A dodging operation can be performed for the third three-storage-unit-failure arrangement by rebuilding the contents of one lost element from each of stripes 1 and 2 (d=1) in a well-known manner onto different elements of stripe 3 (d=4). For example, the contents of element 1 on storage unit D1 can be rebuilt onto element 3 of disk D9, and the contents of element 2 on disk D1 rebuilt onto element 3 of disk D4.

FIG. 8 depicts the array of FIG. 7 after a dodging operation is performed. Rebuilt data in FIG. 8 is underlined. The result will again be that all stripes have minimum distance d=2 and, consequently, array 300 has minimum distance D=2. After the dodging operation, it is important that no storage unit contains two elements from the same stripe. That is, none of the elements of stripe 3 stored on storage units D4-D6 are selected for rebuilding any of the lost elements of stripe 1 because each of storage units D4-D6 already contains an element of stripe 1. Similarly, none of the elements of stripe 3 stored on storage units D7-D9 are selected for rebuilding any of the lost elements of stripe 2.

Dodging thus provides a technique to restore the minimum distance of array 300 of FIG. 3 from D=1 to D=2 after any three storage-unit failures. Moreover, the effective minimum distance of array 300 has been increased from d=4 to d=5, even though the system of array 300 still has the write performance of a code having distance d=4.
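
The inspection of failure combinations described for array 300 can be automated. The following Python sketch enumerates all 84 three-unit failures of a nine-unit array and applies dodging operations greedily. It rests on stated assumptions: FIG. 3 is not reproduced here, so `LAYOUT_300` is a placement inferred from the description of FIGS. 4-8 (stripe 1 on D1-D6, stripe 2 on D1-D3 and D7-D9, stripe 3 on D4-D9), and the greedy order (weakest recipient first, strongest donor first, lowest-numbered eligible unit) is an illustrative policy rather than the selection criteria of the invention.

```python
from itertools import combinations

N = 3  # data elements per stripe of the (3+3) code; with `live` elements, d = live - N + 1

# Placement inferred from the text (an assumption; FIG. 3 is not reproduced here).
LAYOUT_300 = {
    1: {1, 2, 3, 4, 5, 6},
    2: {1, 2, 3, 7, 8, 9},
    3: {4, 5, 6, 7, 8, 9},
}

def distances(live):
    """Distance of each stripe given the set of units holding its live elements."""
    return {s: len(units) - N + 1 for s, units in live.items()}

def dodge_until_stable(layout, failed):
    """Apply dodging operations greedily until no stripe pair differs by 2 or more."""
    live = {s: set(units) - failed for s, units in layout.items()}
    moved = True
    while moved:
        moved = False
        d = distances(live)
        for recipient in sorted(d, key=lambda s: (d[s], s)):      # weakest stripe first
            for donor in sorted(d, key=lambda s: (-d[s], s)):     # strongest donor first
                if d[donor] - d[recipient] < 2:
                    break  # remaining donors are no stronger, so none qualifies
                # The donated element must sit on a unit holding no element of the recipient.
                candidates = live[donor] - live[recipient]
                if candidates:
                    unit = min(candidates)
                    live[donor].remove(unit)
                    live[recipient].add(unit)
                    moved = True
                    break
            if moved:
                break
    return live

failures = [set(f) for f in combinations(range(1, 10), 3)]
worst_before = min(min(distances({s: u - f for s, u in LAYOUT_300.items()}).values())
                   for f in failures)
worst_after = min(min(distances(dodge_until_stable(LAYOUT_300, f)).values())
                  for f in failures)
print(worst_before, worst_after)  # 1 2
```

Under the assumed placement, the worst case over all three-unit failures is D=1 without dodging and D=2 with dodging, so a fourth unit failure remains survivable, which is consistent with an effective distance of 5.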

The smallest anamorphic array for a (3+3) code in which the effective distance is increased is an array of nine storage units. All arrays that are larger than nine storage units will also have the property of an increased effective distance when a (3+3) code is used. An array of eight storage units for a (3+3) code does not have the property of having an increased effective distance. A (4+4) code utilized over twelve storage units also has the property of an increased effective distance. A dodging operation, however, should occur when the minimum array distance is d=2. Consequently, a further storage-unit failure during the dodging operation can be tolerated.

According to the invention, a dodging operation can be performed within an array, between separate arrays, or between an array and a spare pool, referred to herein as an external dodging operation. While a dodging operation performed to a spare pool is possible, usually it is better to perform a complete rebuild operation onto a spare and to perform a dodging operation only when the spare pool is exhausted. This is because a dodging operation rebuilds only some elements from a failed storage unit onto a donated storage unit, while a sparing operation rebuilds all of the elements from a failed storage unit onto a spare storage unit.

An external dodging operation is different from a parity-exchange operation, such as disclosed by co-pending application Ser. No. ______ (Attorney Docket No. ARC9-2003-0015-US1), and which is incorporated by reference herein. That is, a dodging operation is performed on a stripe basis, while a parity-exchange operation is performed on a storage unit basis.

To illustrate the advantages provided by an external (array-to-array) dodging operation according to the present invention, consider an exemplary array 900 shown in FIG. 9. Array 900 includes eight storage units D1-D8 and uses a (3+3) symmetric code. Array 900 also includes four stripes, indicated by the numerals 1-4. Each stripe has six elements and has distance d=4. As previously mentioned, a dodging operation that is internal to an array of eight storage units that uses a (3+3) code, such as array 900, does not increase the effective minimum Hamming distance of the array because there are too few remaining non-failed stripes to compensate for the number of stripes that are affected by multiple unit failures.

In contrast, consider an exemplary array system 1000, shown in FIG. 10, comprising three eight-storage-unit arrays 1001-1003. Specifically, array 1001 includes storage units D1-D8, array 1002 includes storage units D9-D16, and array 1003 includes storage units D17-D24. Each respective array 1001-1003 also includes four stripes arranged within the array. For example, array 1001 includes stripes 1-4, array 1002 includes stripes 5-8, and array 1003 includes stripes 9-12.

Suppose that after any three storage-unit failures, a parity-exchange operation, such as disclosed by co-pending application Ser. No. ______ (Attorney Docket No. ARC9-2003-0015-US1), is used to ensure that each array 1001-1003 has one failed storage unit. The results of a parity-exchange operation are depicted, for example, by storage units D1, D9 and D17 being shown having Xs within the blocks of the storage unit. Further suppose that a fourth storage-unit failure occurs subsequently to the parity-exchange operation. A fourth storage-unit failure is depicted, for example, by storage unit D2 in array 1001 being shown having Xs within the blocks of storage unit D2. After the fourth storage-unit failure, arrays 1001, 1002 and 1003 respectively have distances D=(2, 3, 3). It should be understood that another storage unit other than storage unit D2 could fail as the fourth storage-unit failure and a procedure that is similar to the procedure described below would be performed for increasing the effective distance of the storage system.

Stripes 1, 2 and 3 in array 1001 are now only distance d=2. A dodging operation that is internal to array 1001 will fail because at least three elements having distance d=4 are required for increasing the minimum distance of array 1001 from 2 to 3. Only stripe 4 in array 1001 has distance d=4. Nevertheless, an external dodging operation between arrays allows the minimum distance of array 1001 to be increased from 2 to 3 without changing the minimum distances of arrays 1002 and 1003. To achieve this, the contents for one of the missing elements of each of stripes 1-3 are rebuilt in a well-known manner onto elements of other stripes that are still at distance d=4, such as stripe 4 on array 1001, stripe 8 on array 1002 and stripe 12 on array 1003.

FIG. 11 depicts array system 1000 of FIG. 10 after performing an external dodging operation according to the present invention. Rebuilt data in FIG. 11 is underlined. Specifically, an element of stripe 3 is selected to be rebuilt onto stripe 4 within storage unit D3. An element of stripe 2 is selected to be rebuilt onto stripe 8 within storage unit D10. Lastly, an element of stripe 1 is selected to be rebuilt onto stripe 12 within storage unit D17.

When an element is selected in a donor stripe, the storage unit containing the selected element cannot contain an element of the recipient stripe. For example, an element of stripe 4 contained in either storage unit D7 or storage unit D8 can be selected for rebuilding a failed element from stripe 1 because both storage units D7 and D8 contain no elements of stripe 1. Similarly, an element of stripe 4 contained in either storage unit D5 or storage unit D6 can be selected for rebuilding a failed element from stripe 2 because both storage units D5 and D6 contain no elements of stripe 2. Lastly, an element of stripe 4 contained in either storage unit D3 or storage unit D4 can be selected for rebuilding a failed element from stripe 3 because both storage units D3 and D4 contain no elements of stripe 3. For purposes of this example, suppose that an element of stripe 4 contained in storage unit D3 is selected to be a donor element for rebuilding an element of stripe 3.
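
The placement constraint on a donor element can be expressed as a set difference. The following Python sketch is illustrative only: `LAYOUT_1001` is a placement of stripes 1-4 on units D1-D8 inferred from the statements above (FIGS. 9 and 10 are not reproduced here), and the helper name `eligible_donor_units` is an assumption.

```python
# Placement of stripes 1-4 on units D1-D8 of array 1001, inferred from the text (assumption).
LAYOUT_1001 = {
    1: {1, 2, 3, 4, 5, 6},
    2: {1, 2, 3, 4, 7, 8},
    3: {1, 2, 5, 6, 7, 8},
    4: {3, 4, 5, 6, 7, 8},
}

def eligible_donor_units(layout, donor, recipient, failed):
    """Units of the donor stripe that are alive and hold no element of the recipient.

    `layout` must reflect current placement; after an earlier dodging operation,
    the donated unit belongs to the recipient stripe rather than to the donor.
    """
    return layout[donor] - failed - layout[recipient]

failed = {1, 2}  # storage units D1 and D2 have failed
for recipient in (1, 2, 3):
    units = sorted(eligible_donor_units(LAYOUT_1001, 4, recipient, failed))
    print("stripe", recipient, "<- stripe 4 element on D%d or D%d" % tuple(units))
```

Under this assumed placement, the sketch reports units D7 and D8 for stripe 1, D5 and D6 for stripe 2, and D3 and D4 for stripe 3.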

Any one of the elements of stripe 8, having d=4, could be selected for rebuilding a failed element of any one of remaining stripes, each having d=2, and any one of the elements of stripe 12, having d=4, could be selected for rebuilding a failed element of the last remaining stripe having d=2. For purposes of this illustrative example, suppose that an element of stripe 8 contained in storage unit D11 is selected to be a donor element for rebuilding an element of stripe 2, and that an element of stripe 12 contained in storage unit D19 is selected to be a donor element for rebuilding an element of stripe 1. For each of the dodging selections of this illustrative example prior to the dodging operation, a donor stripe had distance d=4, and a recipient stripe (i.e., stripes 1-3) had distance d=2.

The net result of the external dodging operation is that array system 1000 has minimum distance D=3 after four failures. In contrast, the minimum distance would have been only 2 using only a parity-exchange operation, such as disclosed by co-pending patent application Ser. No. ______ (Attorney Docket No. ARC9-2003-0015-US1). Consequently, when an external dodging operation is utilized for array system 1000, five failures are required for array system 1000 to have a minimum distance of d=2. This is the same result for an array system of 24 units that are arranged as four arrays of six units and in which each array uses only a parity-exchange operation as disclosed by co-pending patent application Ser. No. ______ (Attorney Docket No. ARC9-2003-0015-US1). Thus, a dodging operation in combination with a parity-exchange operation provides that system reliability is independent of the array configuration.

The process of a generalized dodging operation in combination with a parity-exchange operation can continue with each further storage-unit failure. Failed elements are rebuilt onto surviving elements such that the minimum distance for each array is maximized, as described above. In the exemplary 24-unit array system of FIGS. 9 and 10, eight storage-unit failures are required for a minimum distance of d=2 for all of the arrays of the system. Once the minimum distance for each array is d=2, it is no longer possible to perform parity-exchange operations or dodging operations to increase the effective distance of the system. Consequently, two further failures can result in a data loss.
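
The failure counts quoted for the 24-unit system follow from a simple counting argument, sketched below in Python. The sketch is an idealization introduced here, not a procedure from the disclosure: it assumes that parity-exchange and dodging operations always succeed in spreading lost elements evenly over the 12 stripes, and it takes 3 elements per storage unit (24 units holding 12 stripes of 6 elements). The function name `balanced_min_distance` is hypothetical.

```python
from math import ceil

def balanced_min_distance(failures, d=4, elements_per_unit=3, stripes=12):
    """Minimum stripe distance if lost elements are spread evenly over all stripes.

    Assumes perfect balancing by parity-exchange and dodging (an idealization).
    """
    lost_elements = failures * elements_per_unit
    return d - ceil(lost_elements / stripes)

for f in (3, 4, 5, 8, 9):
    print(f, "failures -> minimum distance", balanced_min_distance(f))
```

With these assumptions the minimum distance is 3 after four failures and 2 after five, and at eight failures the 24 lost elements amount to exactly two per stripe, so every stripe is at d=2 and no stripe has distance to donate.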

Over time, generalized dodging operations result in stripes no longer being local to a given array. Thus, the selection of a donor stripe may be realized as selecting a donor element.

Once a spare unit becomes available, such as through maintenance, it can be assigned to replace any of the failed units. It is preferred to rebuild elements of the stripes with the smallest minimum distances onto the spare unit. Consider the example of FIG. 12. Array 1200 includes ten storage units D1-D10 and uses a (3+3) symmetric code. Array 1200 also includes five stripes, indicated by the numerals 1-5. Each stripe has six elements and has distance d=4. Three storage units, D1, D2 and D4, are shown as failed, and an element of stripe 1 has been rebuilt through a dodging operation onto an element of stripe 5 on unit D10, and an element of stripe 4 has been rebuilt through a dodging operation onto a second element of stripe 5 on unit D8. At this point, stripes 1, 2, 4 and 5 all have distance d=2, and stripe 3 has distance d=3. If a spare is subsequently made available to array 1200, information from the stripes at distance d=2 should be rebuilt onto the spare. The elements to rebuild onto the spare should be chosen from the set of elements not currently present in the stripe. For example, assume the element of stripe 1 on unit D2 was rebuilt onto unit D10, and the element of stripe 4 from unit D1 was rebuilt onto unit D8. The elements to be rebuilt onto the spare cannot include these rebuilt elements of stripes 1 and 4. Once the elements to be rebuilt have been selected, the information is rebuilt onto the spare in a well-known manner.
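
The choice of elements to rebuild onto a spare can be sketched as a small selection routine. This Python fragment is a hedged illustration: FIG. 12 is not reproduced here, so the entries of `missing` for stripes 2 and 3 are hypothetical, the capacity of three elements per spare unit is inferred from ten units holding five stripes of six elements, and the function name `pick_for_spare` and its tie-breaking order are assumptions. At most one element per stripe is placed on the spare, because each element of a stripe is stored on a different storage unit.

```python
def pick_for_spare(distance, missing, slots):
    """Pick at most one missing element per stripe, weakest stripes first.

    distance: stripe -> current distance
    missing:  stripe -> original unit numbers whose element is not currently present
    slots:    number of elements the spare unit can hold
    """
    picks = []
    for stripe in sorted(distance, key=lambda s: (distance[s], s)):
        if len(picks) == slots:
            break
        if missing[stripe]:
            picks.append((stripe, min(missing[stripe])))
    return picks

distance = {1: 2, 2: 2, 3: 3, 4: 2, 5: 2}
missing = {
    1: {1, 4},    # the D2 element of stripe 1 was already rebuilt onto D10
    2: {1, 2},    # hypothetical: which two units held the lost elements of stripe 2
    3: {4},       # hypothetical: which unit held the lost element of stripe 3
    4: {2, 4},    # the D1 element of stripe 4 was already rebuilt onto D8
    5: {8, 10},   # elements of stripe 5 donated during the dodging operations
}
print(pick_for_spare(distance, missing, slots=3))  # [(1, 1), (2, 1), (4, 2)]
```

Each pair names a stripe and the original unit of the element to rebuild; the already rebuilt elements of stripes 1 and 4 are never candidates because they do not appear in `missing`.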

The primary criterion for selecting a donor element is based on selecting a donor element that has the least impact on the donor stripe reliability. A secondary criterion is based on selecting the storage element having the least impact on performance, such as the element with the most expensive redundancy calculation. In the example of FIG. 12, the elements of stripe 5 were chosen primarily because stripe 5 had the highest distance, and secondarily because D8 and D10 had the most expensive parity calculations. D9 could not be chosen for rebuilding an element of stripe 4, because D9 already contained an element of stripe 4. The primary criterion for selecting the information to be rebuilt is based on the information set that provides the greatest increase in reliability. A secondary criterion is to select the information set that provides the best array performance following the rebuild operation. In the example of FIG. 12, the elements of stripes 1 and 4 were chosen to be rebuilt primarily based on their remaining distances, and secondarily because rebuilding the chosen elements maximizes performance following the dodging operation.
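
The donor-selection criteria above can be sketched as follows. This is an illustrative reading, not the claimed method itself: a candidate is eligible when its stripe's distance exceeds the recipient's by at least 2 and its storage unit holds no element of the recipient stripe; among eligible candidates, the highest donor distance is preferred, then the most expensive redundancy calculation. The Element type, the parity_cost field, the function choose_donor, and the sample layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    stripe: int       # stripe the element currently belongs to
    unit: str         # storage unit holding the element
    parity_cost: int  # hypothetical relative cost of its redundancy calculation


def choose_donor(elements, distance, recipient, recipient_units):
    """Return the preferred donor element for `recipient`, or None if none is eligible."""
    candidates = [
        e for e in elements
        if e.stripe != recipient
        # Donation must leave the donor no weaker than the rebuilt recipient.
        and distance[e.stripe] - distance[recipient] >= 2
        # The donor unit must not already hold an element of the recipient stripe.
        and e.unit not in recipient_units
    ]
    if not candidates:
        return None
    # Primary: highest donor distance. Secondary: most expensive redundancy calculation.
    return max(candidates, key=lambda e: (distance[e.stripe], e.parity_cost))


# Hypothetical layout loosely modeled on the stripe 4 rebuild described for FIG. 12.
distance = {3: 3, 4: 2, 5: 4}
elements = [
    Element(stripe=3, unit="D6", parity_cost=1),
    Element(stripe=5, unit="D8", parity_cost=3),
    Element(stripe=5, unit="D9", parity_cost=5),
    Element(stripe=5, unit="D10", parity_cost=2),
]
donor = choose_donor(elements, distance, recipient=4, recipient_units={"D3", "D5", "D9"})
print(donor)
```

In this layout the element on D9 is excluded even though its calculation is the most expensive, because D9 already holds an element of stripe 4, and stripe 3 is excluded because its distance exceeds the recipient's by only 1.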

There is a further important effect of generalized dodging. In the example of FIGS. 9 and 10, a generalized dodging operation was performed between arrays when the recipient stripe was at distance d=2, while an internal dodging operation was performed on the nine-unit array (FIGS. 6 and 7) when the recipient stripe was at distance d=1. Thus, two failures would be required within the stripes involved in the external dodging operation for data loss during the external dodging operation.

Donating an element from a donor stripe to a recipient stripe requires that the storage system be able to assign elements from one stripe to another stripe. When the donor and recipient stripes are connected to a common array controller 101, as shown in FIG. 1 a, then controller 101 can perform this operation internally. When the donor and recipient stripes are connected to separate controllers 151 and 152, as shown in FIG. 1 b, then controllers 151 and 152 exchange information. For example, the controllers could expose the individual drives over communication connection 160, such as in the manner of a Just a Bunch of Disks (JBOD) array configuration. Alternatively, the controllers could exchange information regarding the data to be read and written from the locations on the storage units involved.

The technique of dodging has been described for anamorphic arrays. Dodging can be used with array types having a minimum distance d=3 or more. Generalized dodging, though, works best with symmetric arrays.

Dodging can be used beyond simply increasing the minimum distance of a storage system. Many other factors may be included in determining whether to perform dodging and to choose donors and recipients. For example, the individual failure probabilities of components when they are non-uniform, the combinations of failures that lead to data loss, and the effects on system performance may all be considered. In such cases, the minimum distance of the system could remain unchanged following dodging.

While the present invention has been described in terms of storage arrays formed from HDD storage units, the present invention is applicable to storage systems formed from arrays of other memory devices, such as Random Access Memory (RAM) storage devices (both volatile and non-volatile), optical storage devices, and tape storage devices. Additionally, it is suitable for virtualized storage systems, such as arrays built out of network-attached storage. It is further applicable to any redundant system in which there is some state information that associates a redundant component with a particular subset of components, and that state information may be transferred using a donation operation.

Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced that are within the scope of the appended claims. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.

1. A method for increasing an error tolerance of an array of m storage units, the method comprising steps of: storing k stripes across the array of m storage units, each stripe having a plurality of elements, each stripe forming an erasure or error correcting code having a minimum Hamming distance d, and each respective element of a stripe being stored on a different storage unit; selecting an element in a donor stripe when a difference between a minimum distance of the donor stripe and a minimum distance of a recipient stripe is greater than or equal to 2, the selected element being stored on a storage unit having no elements of the recipient stripe; and rebuilding a lost element of the recipient stripe on the selected element.
 2. The method according to claim 1, wherein the minimum Hamming distance of the recipient stripe is d≧2 before the step of selecting the element in the donor stripe.
 3. The method according to claim 1, further comprising a step of indicating to the storage units storing the donor stripe that the selected element has been donated before the step of rebuilding the lost element of the recipient stripe on the selected element.
 4. The method according to claim 1, wherein the array of m storage units is an anamorphic array, each stripe having n+r elements in which n is the number of data elements in the base array, r is the number of redundant elements in the base array, m>n+r, jm=k(n+r), and j and k are integers.
 5. The method according to claim 1, wherein the storage units are hard disk drives.
 6. The method according to claim 1, wherein the storage units are RAM storage devices.
 7. The method according to claim 1, wherein the donor stripe is further selected based on a minimal performance impact on the array.
 8. The method according to claim 1, further comprising a step of selecting the recipient information based on an improved performance of the array.
 9. The method according to claim 1, wherein the erasure or error correcting code is a parity code.
 10. The method according to claim 1, wherein the erasure or error correcting code is a Winograd code.
 11. The method according to claim 1, wherein the erasure or error correcting code is a symmetric code.
 12. The method according to claim 1, wherein the erasure or error correcting code is a Reed-Solomon code.
 13. The method according to claim 1, wherein the erasure or error correcting code is an EVENODD code.
 14. The method according to claim 1, wherein the erasure or error correcting code is a derivative of an EVENODD code.
 15. The method according to claim 1, wherein the array includes redundancy based on a product of a plurality of erasure or error correcting codes.
 16. The method according to claim 15, wherein at least one of the erasure or error correcting codes is a parity code.
 17. The method according to claim 15, wherein at least one of the erasure or error correcting codes is a Winograd code.
 18. The method according to claim 15, wherein at least one of the erasure or error correcting codes is a symmetric code.
 19. The method according to claim 15, wherein at least one of the erasure or error correcting codes is a Reed-Solomon code.
 20. The method according to claim 15, wherein at least one of the erasure or error correcting codes is an EVENODD code.
 21. The method according to claim 15, wherein at least one of the erasure or error correcting codes is a derivative of an EVENODD code.
 22. The method according to claim 1, wherein when an element in the donor stripe fails during the step of rebuilding at least a portion of recipient information from the recipient stripe on the selected element, the method further comprising steps of: terminating the step of rebuilding at least a portion of recipient information from the recipient stripe on the selected element; selecting a second donor stripe from the plurality of stripes when a difference between a minimum distance of the second donor stripe and a minimum distance of the second recipient stripe is greater than or equal to 2; selecting a donor element in the second donor stripe; and rebuilding at least a portion of lost recipient information from the recipient stripe on the selected element in the second donor stripe.
 23. The method according to claim 1, wherein when a spare element becomes available, the method further comprising a step of assigning the spare element to a selected storage unit.
 24. A method of increasing the failure tolerance of an array of m storage units that is vulnerable to selected patterns of failures, comprising steps of: storing k stripes across the array of m storage units, each stripe having a plurality of elements, each stripe forming an erasure or error correcting code having a minimum Hamming distance d, and each respective element of a stripe being stored on a different storage unit; subsequent to an element failure, selecting a recipient element; selecting an element in a donor stripe such that a failure tolerance of the array is increased following a rebuild operation; and rebuilding a lost element of the recipient stripe on the selected element of the donor stripe.
 25. The method according to claim 24, wherein the minimum Hamming distance of the recipient stripe is d≧2 before the step of selecting the element in the donor stripe.
 26. The method according to claim 24, wherein the minimum Hamming distance of the array is increased upon completion of the step of rebuilding.
 27. The method according to claim 24, wherein the recipient element is selected based on a failure pattern of the array.
 28. The method according to claim 24, wherein the donor element is selected based on a predetermined target pattern.
 29. The method according to claim 24, further comprising a step of indicating to the storage units storing the donor stripe that the selected element has been donated before the step of rebuilding the lost element of the recipient stripe on the selected element.
 30. The method according to claim 24, wherein the array of m storage units is an anamorphic array, each stripe having n+r elements in which n is the number of data elements in the base array, r is the number of redundant elements in the base array, m>n+r, jm=k(n+r), and j and k are integers.
 31. The method according to claim 24, wherein the storage units are hard disk drives.
 32. The method according to claim 24, wherein the storage units are RAM storage devices.
 33. The method according to claim 24, wherein the erasure or error correcting code is a parity code.
 34. The method according to claim 24, wherein the erasure or error correcting code is a Winograd code.
 35. The method according to claim 24, wherein the erasure or error correcting code is a symmetric code.
 36. The method according to claim 24, wherein the erasure or error correcting code is a Reed-Solomon code.
 37. The method according to claim 24, wherein the erasure or error correcting code is an EVENODD code.
 38. The method according to claim 24, wherein the erasure or error correcting code is a derivative of an EVENODD code.
 39. The method according to claim 24, wherein the array of storage units includes redundancy based on a product of a plurality of erasure or error correcting codes.
 40. The method according to claim 39, wherein at least one of the erasure or error correcting codes is a parity code.
 41. The method according to claim 39, wherein at least one of the erasure or error correcting codes is a Winograd code.
 42. The method according to claim 39, wherein at least one of the erasure or error correcting codes is a symmetric code.
 43. The method according to claim 39, wherein at least one of the erasure or error correcting codes is a Reed-Solomon code.
 44. The method according to claim 39, wherein at least one of the erasure or error correcting codes is an EVENODD code.
 45. The method according to claim 39, wherein at least one of the erasure or error correcting codes is a derivative of an EVENODD code.
 46. A method for increasing an error tolerance of a storage system having a plurality of arrays of storage units, each array having m storage units, the method comprising steps of: storing k stripes across each respective array of m storage units, each stripe having a plurality of elements, each stripe forming an error or erasure correcting code having a minimum Hamming distance d, and each respective element of a stripe being stored on a different storage unit in the array; selecting an element in a donor stripe when a difference between a minimum distance of the donor stripe and a minimum distance of a recipient stripe is greater than or equal to 2, the selected element being stored on a storage unit having no elements of the recipient stripe; and rebuilding a lost element of the recipient stripe on the selected element.
 47. The method according to claim 46, wherein the donor stripe is stored on an array that is different from the array of the recipient stripe.
 48. The method according to claim 46, wherein the donor stripe is stored on the same array as the recipient stripe.
 49. The method according to claim 46, wherein the minimum Hamming distance of the recipient stripe is d≧2 before the step of selecting the element in the donor stripe.
 50. The method according to claim 46, further comprising a step of indicating to the storage units storing the donor stripe that the selected element has been donated before the step of rebuilding the lost element of the recipient stripe on the selected element.
 51. The method according to claim 46, wherein each array of m storage units is an anamorphic array, each stripe having n+r elements in which n is the number of data elements in the base array, r is the number of redundant elements in the base array, m>n+r, jm=k(n+r), and j and k are integers.
 52. The method according to claim 46, wherein the storage units are hard disk drives.
 53. The method according to claim 46, wherein the storage units are RAM storage devices.
 54. The method according to claim 46, wherein the selected element of the donor stripe is further selected based on a minimal performance impact on the donor stripe.
 55. The method according to claim 46, wherein the donor stripe is further selected based on a minimal performance impact on the storage system.
 56. The method according to claim 46, further comprising a step of selecting the recipient information based on an improved performance of the recipient stripe.
 57. The method according to claim 46, further comprising a step of selecting the recipient information based on an improved performance of the storage system.
 58. The method according to claim 46, wherein the erasure or error correcting code is a parity code.
 59. The method according to claim 46, wherein the erasure or error correcting code is a Winograd code.
 60. The method according to claim 46, wherein the erasure or error correcting code is a symmetric code.
 61. The method according to claim 46, wherein the erasure or error correcting code is a Reed-Solomon code.
 62. The method according to claim 46, wherein the erasure or error correcting code is an EVENODD code.
 63. The method according to claim 46, wherein the erasure or error correcting code is a derivative of an EVENODD code.
 64. The method according to claim 46, wherein the array includes redundancy based on a product of a plurality of erasure or error correcting codes.
 65. The method according to claim 64, wherein at least one of the erasure or error correcting codes is a parity code.
 66. The method according to claim 64, wherein at least one of the erasure or error correcting codes is a Winograd code.
 67. The method according to claim 64, wherein at least one of the erasure or error correcting codes is a symmetric code.
 68. The method according to claim 64, wherein at least one of the erasure or error correcting codes is a Reed-Solomon code.
 69. The method according to claim 64, wherein at least one of the erasure or error correcting codes is an EVENODD code.
 70. The method according to claim 64, wherein at least one of the erasure or error correcting codes is a derivative of an EVENODD code.
 71. The method according to claim 46, wherein when an element in the donor stripe fails during the step of rebuilding at least a portion of recipient information from the recipient stripe on the selected element, the method further comprising steps of: terminating the step of rebuilding at least a portion of recipient information from the recipient stripe on the selected element; selecting a second donor stripe from the plurality of stripes when a difference between a minimum distance of the second donor stripe and a minimum distance of the second recipient stripe is greater than or equal to 2; selecting a donor element in the second donor stripe; and rebuilding at least a portion of lost recipient information from the recipient stripe on the selected element in the second donor stripe.
 72. The method according to claim 46, wherein when a spare element becomes available, the method further comprising a step of assigning the spare element to a selected storage unit.
 73. A data storage system, comprising: an array of m storage units, k stripes being stored across the array of m storage units, each stripe having a plurality of elements, each stripe forming an erasure or error correcting code having a minimum Hamming distance d, and each respective element of a stripe being stored on a different storage unit; and a system array controller selecting an element in a donor stripe when a difference between a minimum distance of the donor stripe and a minimum distance of a recipient stripe is greater than or equal to 2, the selected element being stored on a storage unit having no elements of the recipient stripe; the system array controller rebuilding a lost element of the recipient stripe on the selected element.
 74. The data storage system according to claim 73, wherein the minimum Hamming distance of the recipient stripe is d≧2 before the system array controller selects the element in the donor stripe.
 75. The data storage system according to claim 73, wherein the system array controller indicates to the storage units storing the donor stripe that the selected element has been donated before the lost element of the recipient stripe is rebuilt on the selected element.
 76. The data storage system according to claim 73, wherein the array of m storage units is an anamorphic array, each stripe having n+r elements in which n is the number of data elements in the base array, r is the number of redundant elements in the base array, m>n+r, jm=k(n+r), and j and k are integers.
 77. The data storage system according to claim 73, wherein the storage units are hard disk drives.
 78. The data storage system according to claim 73, wherein the storage units are RAM storage devices.
 79. The data storage system according to claim 73, wherein the system array controller selects the donor stripe further based on a minimal performance impact on the array.
 80. The data storage system according to claim 73, wherein the system array controller selects the recipient information based on an improved performance of the array.
 81. The data storage system according to claim 73, wherein the erasure or error correcting code is a parity code.
 82. The data storage system according to claim 73, wherein the erasure or error correcting code is a Winograd code.
 83. The data storage system according to claim 73, wherein the erasure or error correcting code is a symmetric code.
 84. The data storage system according to claim 73, wherein the erasure or error correcting code is a Reed-Solomon code.
 85. The data storage system according to claim 73, wherein the erasure or error correcting code is an EVENODD code.
 86. The data storage system according to claim 73, wherein the erasure or error correcting code is a derivative of an EVENODD code.
 87. The data storage system according to claim 73, wherein the array includes redundancy based on a product of a plurality of erasure or error correcting codes.
 88. The data storage system according to claim 87, wherein at least one of the erasure or error correcting codes is a parity code.
 89. The data storage system according to claim 87, wherein at least one of the erasure or error correcting codes is a Winograd code.
 90. The data storage system according to claim 87, wherein at least one of the erasure or error correcting codes is a symmetric code.
 91. The data storage system according to claim 87, wherein at least one of the erasure or error correcting codes is a Reed-Solomon code.
 92. The data storage system according to claim 87, wherein at least one of the erasure or error correcting codes is an EVENODD code.
 93. The data storage system according to claim 87, wherein at least one of the erasure or error correcting codes is a derivative of an EVENODD code.
 94. The data storage system according to claim 73, wherein when an element in the donor stripe fails as the system array controller is rebuilding at least a portion of recipient information from the recipient stripe on the selected element, the system array controller terminates rebuilding the recipient information from the recipient stripe on the selected element, selects a second donor stripe from the plurality of stripes when a difference between a minimum distance of the second donor stripe and a minimum distance of the second recipient stripe is greater than or equal to 2, selects a donor element in the second donor stripe, and rebuilds at least a portion of lost recipient information from the recipient stripe on the selected element in the second donor stripe.
 95. The data storage system according to claim 73, wherein when a spare element becomes available, the system array controller assigns the spare element to a selected storage unit.
 96. A data storage system, comprising: an array of m storage units, k stripes being stored across the array of m storage units, each stripe having a plurality of elements, each stripe forming an erasure or error correcting code having a minimum Hamming distance d, and each respective element of a stripe being stored on a different storage unit; and a system array controller selecting a recipient element subsequent to an element failure and selecting an element in a donor stripe such that a failure tolerance of the array is increased following a rebuild operation, the system array controller rebuilding a lost element of the recipient stripe on the selected element of the donor stripe.
 97. The data storage system according to claim 96, wherein the minimum Hamming distance of the recipient stripe is d≧2 before the system array controller selects the element in the donor stripe.
 98. The data storage system according to claim 96, wherein the minimum Hamming distance of the array is increased upon completion of rebuilding the lost element of the recipient stripe on the selected element of the donor stripe.
 99. The data storage system according to claim 96, wherein the system array controller selects the recipient element based on a failure pattern of the array.
 100. The data storage system according to claim 96, wherein the system array controller selects the donor element based on a predetermined target pattern.
 101. The data storage system according to claim 96, wherein the system array controller indicates to the storage units storing the donor stripe that the selected element has been donated before the lost element of the recipient stripe is rebuilt on the selected element.
 102. The data storage system according to claim 96, wherein the array of m storage units is an anamorphic array, each stripe having n+r elements in which n is the number of data elements in the base array, r is the number of redundant elements in the base array, m>n+r, jm=k(n+r), and j and k are integers.
 103. The data storage system according to claim 96, wherein the storage units are hard disk drives.
 104. The data storage system according to claim 96, wherein the storage units are RAM storage devices.
 105. The data storage system according to claim 96, wherein the erasure or error correcting code is a parity code.
 106. The data storage system according to claim 96, wherein the erasure or error correcting code is a Winograd code.
 107. The data storage system according to claim 96, wherein the erasure or error correcting code is a symmetric code.
 108. The data storage system according to claim 96, wherein the erasure or error correcting code is a Reed-Solomon code.
 109. The data storage system according to claim 96, wherein the erasure or error correcting code is an EVENODD code.
 110. The data storage system according to claim 96, wherein the erasure or error correcting code is a derivative of an EVENODD code.
 111. The data storage system according to claim 96, wherein the array of storage units includes redundancy based on a product of a plurality of erasure or error correcting codes.
 112. The data storage system according to claim 111, wherein at least one of the erasure or error correcting codes is a parity code.
 113. The data storage system according to claim 111, wherein at least one of the erasure or error correcting codes is a Winograd code.
 114. The data storage system according to claim 111, wherein at least one of the erasure or error correcting codes is a symmetric code.
 115. The data storage system according to claim 111, wherein at least one of the erasure or error correcting codes is a Reed-Solomon code.
 116. The data storage system according to claim 111, wherein at least one of the erasure or error correcting codes is an EVENODD code.
 117. The data storage system according to claim 111, wherein at least one of the erasure or error correcting codes is a derivative of an EVENODD code.
 118. A data storage system, comprising: a plurality of arrays of storage units, each array having m storage units, k stripes being stored across each respective array of m storage units, each stripe having a plurality of elements, each stripe forming an error or erasure correcting code having a minimum Hamming distance d=n+1, and each respective element of a stripe being stored on a different storage unit in the array; and a system array controller selecting an element in a donor stripe when a difference between a minimum distance of the donor stripe and a minimum distance of a recipient stripe is greater than or equal to 2, the selected element being stored on a storage unit having no elements of the recipient stripe, the system array controller rebuilding a lost element of the recipient stripe on the selected element.
 119. The data storage system according to claim 118, wherein the donor stripe is stored on an array that is different from the array of the recipient stripe.
 120. The data storage system according to claim 118, wherein the donor stripe is stored on the same array as the recipient stripe.
 121. The data storage system according to claim 118, wherein the minimum Hamming distance of the recipient stripe is d≧2 before the step of selecting the element in the donor stripe.
 122. The data storage system according to claim 118, wherein the system array controller indicates to the storage units storing the donor stripe that the selected element has been donated before the lost element of the recipient stripe is rebuilt on the selected element.
 123. The data storage system according to claim 118, wherein each array of m storage units is an anamorphic array, each stripe having n+r elements in which n is the number of data elements in the base array, r is the number of redundant elements in the base array, m>n+r, jm=k(n+r), and j and k are integers.
 124. The data storage system according to claim 118, wherein the storage units are hard disk drives.
 125. The data storage system according to claim 118, wherein the storage units are RAM storage devices.
 126. The data storage system according to claim 118, wherein the donor stripe is further selected based on a minimal performance impact on the donor stripe.
 127. The data storage system according to claim 118, wherein the selected element of the donor stripe is further selected based on a minimal performance impact on the storage system.
 128. The data storage system according to claim 118, wherein the system array controller selects the recipient information based on an improved performance of the recipient stripe.
 129. The data storage system according to claim 118, wherein the system array controller selects the recipient information based on an improved performance of the storage system.
 130. The data storage system according to claim 118, wherein the erasure or error correcting code is a parity code.
 131. The data storage system according to claim 118, wherein the erasure or error correcting code is a Winograd code.
 132. The data storage system according to claim 118, wherein the erasure or error correcting code is a symmetric code.
 133. The data storage system according to claim 118, wherein the erasure or error correcting code is a Reed-Solomon code.
 134. The data storage system according to claim 118, wherein the erasure or error correcting code is an EVENODD code.
 135. The data storage system according to claim 118, wherein the erasure or error correcting code is a derivative of an EVENODD code.
 136. The data storage system according to claim 118, wherein the array includes redundancy based on a product of a plurality of erasure or error correcting codes.
 137. The data storage system according to claim 136, wherein at least one of the erasure or error correcting codes is a parity code.
 138. The data storage system according to claim 136, wherein at least one of the erasure or error correcting codes is a Winograd code.
 139. The data storage system according to claim 136, wherein at least one of the erasure or error correcting codes is a symmetric code.
 140. The data storage system according to claim 136, wherein at least one of the erasure or error correcting codes is a Reed-Solomon code.
 141. The data storage system according to claim 136, wherein at least one of the erasure or error correcting codes is an EVENODD code.
 142. The data storage system according to claim 136, wherein at least one of the erasure or error correcting codes is a derivative of an EVENODD code.
 143. The data storage system according to claim 118, wherein when an element in the donor stripe fails as the system array controller is rebuilding recipient information from the recipient stripe on the selected element, the system array controller terminates rebuilding recipient information from the recipient stripe on the selected element, selects a second donor stripe from the plurality of stripes when a difference between a minimum distance of the second donor stripe and a minimum distance of the second recipient stripe is greater than or equal to 2, selects a donor element in the second donor stripe, and rebuilds at least a portion of lost recipient information from the recipient stripe on the selected element in the second donor stripe.
 144. The data storage system according to claim 118, wherein when a spare element becomes available, the system array controller assigns the spare element to a selected storage unit.